MoeaBench v0.8.0 Technical Calibration Report

Scientific Performance Audit and Convergence Metrics

1. Methodology & Experimental Context

This report is the official scientific audit for MoeaBench v0.8.0. Its objective is to validate and calibrate the numerical integrity and topological fidelity of the framework's core algorithms against benchmark problems with known analytical Pareto fronts, referred to throughout as the Ground Truth (GT).

Experimental Setup

2. Metric Glossary & Interpretation

3. Clinical Quality Matrix: Interpretation

Quality Score Scale ($[0, 1]$): This matrix uses a higher-is-better scale.
1.00 (Optimal): Performance is statistically indistinguishable from a perfectly uniform sampling of the Ground Truth.
0.00 (Failure): Performance is no better than a random sampling of the Ground Truth manifold in objective space.

3.1. Numerical Definition & Certification

The Quality Score $Q$ is calculated by normalizing raw metrics against two strict baselines: the Optimal Uniform Sampling ($U_{ref}$) and a Random Sampling ($R_{ref}$) of the Ground Truth.

Normalization Formula:
$$ Q(m) = 1.0 - \text{clip}\left( \frac{m_{obs} - m_{optimal}}{m_{random} - m_{optimal}}, 0, 1 \right) $$
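where $m_{obs}$ is the metric value observed for the algorithm, $m_{optimal}$ is the median of that metric over 30 uniform reference sets ($U_{ref}$), and $m_{random}$ is the median over 30 random reference sets ($R_{ref}$). The numerator and denominator change sign together, so the same formula applies to lower-is-better metrics (such as IGD) and to higher-is-better metrics. Values better than the uniform baseline are clipped to $Q = 1$, and values worse than the random baseline are clipped to $Q = 0$.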
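The normalization can be sketched in a few lines of Python. This is a minimal sketch, not MoeaBench's implementation. The function name `quality_score` is hypothetical, and the sketch assumes the raw metric has already been computed for the observed front and for every uniform and random reference set.

```python
import statistics


def quality_score(m_obs, uniform_values, random_values):
    """Hypothetical helper: map a raw metric onto the Quality Score Q in [0, 1].

    uniform_values: the metric measured on each uniform reference set (U_ref)
    random_values:  the metric measured on each random reference set (R_ref)
    (the report uses 30 sets of each kind)
    """
    m_optimal = statistics.median(uniform_values)
    m_random = statistics.median(random_values)
    span = m_random - m_optimal
    if span == 0:
        raise ValueError("uniform and random baselines coincide; Q is undefined")
    ratio = (m_obs - m_optimal) / span
    clipped = min(max(ratio, 0.0), 1.0)  # clip(ratio, 0, 1)
    return 1.0 - clipped


# Lower-is-better example (IGD-like values) halfway between the two baselines.
uniform = [0.125, 0.25, 0.375]   # median 0.25
random_ = [1.0, 1.25, 1.5]       # median 1.25
print(quality_score(0.75, uniform, random_))  # 0.5
```

Medians are used here because the text specifies them. They also make both baselines robust to the occasional outlier among the 30 reference sets.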

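Certification Terciles: The final verdict is determined by the minimum quality score across all five dimensions (FIT, COVERAGE, DENSITY, REGULARITY, BALANCE). This is the Weakest Link Principle: an algorithm is certified only as highly as its weakest dimension allows. The certification levels reported in the tables are RESEARCH, INDUSTRY, and FAIL.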
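The weakest-link rule can be sketched as below. Several details are our assumptions, inferred from the Clinical Quality Matrix tables rather than taken from MoeaBench's source:

- the tercile cut points (1/3 and 2/3);
- their mapping onto FAIL, INDUSTRY, and RESEARCH;
- the rule that a dimension is flagged in SUMMARY when it scores below 2/3.

With these assumptions the sketch reproduces the SUMMARY and CERTIFICATION columns up to two-decimal rounding. The names `certify`, `DIMENSIONS`, and `FLAG_LABELS` are hypothetical.

```python
# Hypothetical sketch: cut points are inferred from this report's tables,
# not taken from MoeaBench's source code.
DIMENSIONS = ("FIT", "COVERAGE", "DENSITY", "REGULARITY", "BALANCE")
FLAG_LABELS = {
    "FIT": "Poor Fit",
    "COVERAGE": "Low Cov",
    "DENSITY": "Low Density",
    "REGULARITY": "Irregular",
    "BALANCE": "Imbalanced",
}
LOWER_TERCILE = 1 / 3   # assumed boundary between FAIL and INDUSTRY
UPPER_TERCILE = 2 / 3   # assumed boundary between INDUSTRY and RESEARCH


def certify(scores):
    """Return (summary, certification) for a dict of per-dimension Q scores."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise KeyError(f"missing quality dimensions: {missing}")

    weakest = min(scores[d] for d in DIMENSIONS)  # Weakest Link Principle
    if weakest >= UPPER_TERCILE:
        verdict = "RESEARCH"
    elif weakest >= LOWER_TERCILE:
        verdict = "INDUSTRY"
    else:
        verdict = "FAIL"

    # Assumed: a dimension is flagged when it falls below the upper tercile.
    flags = [FLAG_LABELS[d] for d in DIMENSIONS if scores[d] < UPPER_TERCILE]
    summary = ", ".join(flags) if flags else "Optimal"
    return summary, verdict


# NSGA2 on DTLZ1, taken from the Clinical Quality Matrix of that benchmark.
print(certify({"FIT": 0.96, "COVERAGE": 0.36, "DENSITY": 0.90,
               "REGULARITY": 0.93, "BALANCE": 0.96}))
# ('Low Cov', 'INDUSTRY')
```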

  • T-conv (Stabilization): The generation at which the algorithm reaches a stable state, i.e. its IGD is within 5% of its final value. It is reported in the Stabil. column of each benchmark table.
  • Time (s): Average wall-clock execution time per run on the reference hardware.
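  • Scientific Note: The Discretization Effect & Negative H_diff
    In cases of near-perfect convergence, H_rel can exceed 100% (e.g., NSGA2 reaches 100.02% on DTLZ1 and 100.22% on DPF3). The Ground Truth is a finite sample of the true Pareto front, so its hypervolume slightly underestimates that of the continuous front. A well-converged population that fills the gaps between GT points can therefore dominate a marginally larger volume, which pushes H_rel above 100% and H_diff below zero.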
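A two-objective toy example makes the discretization effect concrete. This is not MoeaBench's hypervolume routine. The helper names (`hypervolume_2d`, `linear_front`), the linear front f2 = 1 - f1, the reference point (1.1, 1.1), and the 11-point and 101-point samples are all illustrative assumptions.

```python
def hypervolume_2d(points, ref):
    """Exact hypervolume of a 2-objective minimization set w.r.t. reference point `ref`."""
    # Keep only points that strictly dominate the reference point, sorted by f1.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    volume = 0.0
    prev_f2 = ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:  # points that do not improve f2 are dominated: skip them
            # Horizontal slab between f2 and prev_f2, spanning from f1 to ref[0].
            volume += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return volume


def linear_front(n):
    """n + 1 evenly spaced points on the linear front f2 = 1 - f1, f1 in [0, 1]."""
    return [(i / n, 1.0 - i / n) for i in range(n + 1)]


REF = (1.1, 1.1)
gt = linear_front(10)        # coarse stand-in for a discretized Ground Truth (11 points)
approx = linear_front(100)   # converged population on the same front (101 points)

h_gt = hypervolume_2d(gt, REF)       # about 0.66
h_obs = hypervolume_2d(approx, REF)  # about 0.705
print(f"H_rel = {h_obs / h_gt:.2%}")  # H_rel = 106.82%
```

Every point of both sets lies exactly on the front, yet the denser set dominates more area. For this front the staircase loses 1/(2n) of area with n intervals, so the excess shrinks as the reference sample gets denser. This is consistent with the small excesses (0.02% to 0.22%) seen in the tables.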

    DTLZ1 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.3076 | 3.1689e-01 ± 1.4e+00 | 1.6218e-02 ± 9.9e-03 | 0.0604 | 1.3185 | 0.9906 | 99.09% | 117.98 | Gen 500 |
    | NSGA2 | 0.0493 | 2.4803e-01 ± 1.5e+00 | 4.3729e-02 ± 2.3e-01 | 0.0554 | 1.3310 | 1.0000 | 100.02% | 87.18 | Gen 300 |
    | NSGA3 | 0.0298 | 1.4113e-02 ± 2.0e-03 | 2.7772e-03 ± 9.2e-03 | 0.0543 | 1.2642 | 0.9498 | 99.73% | 13.09 | Gen 200 |
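The Stabil. column reports the T-conv generation defined in the T-conv entry above. One way to compute it from an IGD history is sketched below. The sketch makes two assumptions:

- "stable" means the IGD stays within 5% (relative) of its final value from that generation onward;
- IGD is recorded at checkpoints. All reported values are multiples of 100, which suggests, but does not confirm, a 100-generation interval.

The function name `t_conv` is hypothetical.

```python
def t_conv(generations, igd_history, tol=0.05):
    """Hypothetical helper: first recorded generation from which IGD stays
    within `tol` (relative) of its final value until the end of the run."""
    if len(generations) != len(igd_history):
        raise ValueError("generations and igd_history must have the same length")
    if not igd_history:
        return None

    final = igd_history[-1]
    band = tol * abs(final)          # 5% of the final IGD by default
    stable_from = None
    for gen, igd in zip(generations, igd_history):
        if abs(igd - final) <= band:
            if stable_from is None:
                stable_from = gen    # possible start of the stable stretch
        else:
            stable_from = None       # left the band again, so restart the search
    return stable_from


gens = [100, 200, 300, 400, 500]
igd = [0.90, 0.12, 0.052, 0.0505, 0.05]
print(t_conv(gens, igd))  # 300
```

Resetting the candidate whenever the IGD leaves the band means a run that touches the 5% band early and then drifts away is not reported as stabilized at that early generation.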

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

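    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.95 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |
    | NSGA2 | 0.96 | 0.36 | 0.90 | 0.93 | 0.96 | Low Cov | 30 | INDUSTRY |
    | NSGA3 | 0.96 | 0.00 | 0.84 | 0.00 | 0.69 | Low Cov, Irregular | 30 | FAIL |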

    DTLZ2 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.0394 | 1.8089e-02 ± 8.2e-04 | 8.2981e-03 ± 8.8e-03 | 0.0991 | 0.7739 | 0.5815 | 97.11% | 118.74 | Gen 100 |
    | NSGA2 | 0.0312 | 1.8039e-02 ± 8.3e-04 | 1.6873e-02 ± 9.1e-03 | 0.0929 | 0.8311 | 0.6244 | 97.46% | 80.07 | Gen 100 |
    | NSGA3 | 0.0398 | 1.6911e-02 ± 1.4e-04 | 7.1919e-03 ± 8.7e-03 | 0.0991 | 0.7663 | 0.5757 | 96.90% | 13.27 | Gen 100 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

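    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.94 | 0.00 | 0.66 | 0.00 | 0.58 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |
    | NSGA2 | 0.94 | 0.18 | 0.86 | 0.93 | 0.88 | Low Cov | 30 | FAIL |
    | NSGA3 | 0.95 | 0.00 | 0.66 | 0.00 | 0.62 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |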

    DTLZ3 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.5402 | 1.6522e+00 ± 1.1e+00 | 2.1422e-01 ± 2.8e-02 | 2.2393 | 1.2829 | 0.9639 | 96.56% | 112.82 | Gen 1000 |
    | NSGA2 | 0.5989 | 2.3090e+00 ± 1.4e+01 | 1.5829e-01 ± 8.7e-01 | 0.0975 | 1.3310 | 1.0000 | 100.00% | 84.55 | Gen 400 |
    | NSGA3 | 0.4198 | 5.4060e-01 ± 1.7e+00 | 1.6181e-01 ± 5.1e-01 | 0.1087 | 1.3309 | 0.9999 | 99.99% | 12.43 | Gen 600 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

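    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.00 | 0.00 | 0.50 | 0.00 | 0.14 | Poor Fit, Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |
    | NSGA2 | 0.94 | 0.35 | 0.90 | 0.98 | 0.93 | Low Cov | 30 | INDUSTRY |
    | NSGA3 | 0.94 | 0.08 | 0.75 | 0.26 | 0.76 | Low Cov, Irregular | 30 | FAIL |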

    DTLZ4 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.0349 | 1.8882e-02 ± 1.3e-03 | 1.3461e-02 ± 7.7e-03 | 0.0807 | 0.7705 | 0.5789 | 97.98% | 112.80 | Gen 100 |
    | NSGA2 | 0.0326 | 1.7482e-02 ± 6.3e-04 | 1.8044e-02 ± 8.7e-03 | 0.1092 | 0.8314 | 0.6247 | 97.26% | 77.83 | Gen 100 |
    | NSGA3 | 0.0398 | 1.6920e-02 ± 4.3e-04 | 7.2102e-03 ± 8.7e-03 | 0.1085 | 0.7718 | 0.5799 | 97.06% | 12.55 | Gen 100 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

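    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.94 | 0.23 | 0.79 | 0.82 | 0.86 | Low Cov | 30 | FAIL |
    | NSGA2 | 0.94 | 0.12 | 0.85 | 0.93 | 0.94 | Low Cov | 30 | FAIL |
    | NSGA3 | 0.94 | 0.00 | 0.67 | 0.00 | 0.63 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |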

    DTLZ5 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.0068 | 6.5874e-04 ± 3.8e-04 | 5.2277e-03 ± 8.3e-03 | 0.0476 | 0.4206 | 0.3160 | 98.91% | 111.16 | Gen 100 |
    | NSGA2 | 0.0019 | 7.9038e-04 ± 1.9e-04 | 1.7806e-03 ± 1.3e-03 | 0.0317 | 0.2691 | 0.2022 | 99.65% | 82.41 | Gen 300 |
    | NSGA3 | 0.0038 | 4.9039e-04 ± 3.3e-05 | 2.6192e-03 ± 2.2e-03 | 0.0951 | 0.2692 | 0.2023 | 99.16% | 11.83 | Gen 100 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

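    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |
    | NSGA2 | 1.00 | 0.87 | 0.99 | 1.00 | 0.86 | Optimal | 30 | RESEARCH |
    | NSGA3 | 1.00 | 0.86 | 0.91 | 0.90 | 0.91 | Optimal | 30 | RESEARCH |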

    DTLZ6 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.0560 | 8.9077e-02 ± 2.6e-02 | 1.5905e-02 ± 1.2e-02 | 0.0773 | 0.6251 | 0.4696 | 93.81% | 112.39 | Gen 200 |
    | NSGA2 | 0.1586 | 2.0389e-01 ± 5.9e-01 | 1.0132e-02 ± 1.7e-02 | 0.0507 | 1.2665 | 0.9516 | 99.07% | 76.78 | Gen 400 |
    | NSGA3 | 0.1930 | 2.4915e-01 ± 6.0e-01 | 1.6877e-02 ± 1.8e-02 | 0.1609 | 1.2459 | 0.9360 | 98.39% | 14.64 | Gen 600 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

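    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.85 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |
    | NSGA2 | 0.92 | 0.00 | 0.00 | 0.00 | 0.15 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |
    | NSGA3 | 0.82 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |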

    DTLZ7 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.0928 | 2.2038e-02 ± 1.7e-03 | 0.0000e+00 ± 0.0e+00 | 0.5207 | 0.0000 | 0.0000 | 0.00% | 10.00 | Gen 200 |
    | NSGA2 | 0.0340 | 2.0875e-02 ± 3.9e-03 | 3.4321e-02 ± 4.3e-03 | 0.0807 | 0.6885 | 0.5173 | 96.53% | 76.05 | Gen 200 |
    | NSGA3 | 0.1128 | 2.2767e-02 ± 6.6e-03 | 0.0000e+00 ± 0.0e+00 | 0.0743 | 0.0000 | 0.0000 | 0.00% | 10.00 | Gen 300 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

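    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.81 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |
    | NSGA2 | 0.91 | 0.00 | 0.00 | 0.06 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |
    | NSGA3 | 0.92 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |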

    DTLZ8 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | NSGA2 | 840.6049 | 8.4035e+02 ± 1.0e+03 | 1.1544e-03 ± 2.2e-03 | 0.2981 | 0.7990 | 0.6003 | 60.03% | 30.89 | Gen 500 |
    | NSGA3 | 0.0490 | 1.6853e-02 ± 9.9e-04 | 1.7490e-02 ± 1.7e-03 | 0.0187 | 0.9724 | 0.7306 | 94.99% | 15.26 | Gen 500 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

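    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | NSGA2 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 | Poor Fit, Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |
    | NSGA3 | 0.94 | 0.00 | 0.00 | 0.08 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |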

    DTLZ9 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | NSGA2 | 1730.8555 | 1.7308e+03 ± 0.0e+00 | 0.0000e+00 ± 0.0e+00 | 999.3098 | 0.0010 | 0.0008 | 0.08% | 12.75 | Gen 100 |
    | NSGA3 | 0.2067 | 6.6958e-02 ± 6.7e-03 | 8.3061e-03 ± 1.0e-03 | 0.2184 | 0.1461 | 0.1098 | 54.21% | 15.08 | Gen 500 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

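    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | NSGA2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | Poor Fit, Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |
    | NSGA3 | 0.79 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |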

    DPF1 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.3479 | 3.1650e-01 ± 9.8e-01 | 5.1073e-03 ± 2.4e-03 | 0.0773 | 1.2175 | 0.9147 | 94.40% | 117.58 | Gen 500 |
    | NSGA2 | 0.0051 | 2.3259e-02 ± 1.4e-01 | 1.3109e-03 ± 3.0e-03 | 0.0039 | 1.1720 | 0.8805 | 99.83% | 82.37 | Gen 400 |
    | NSGA3 | 0.0937 | 9.7682e-02 ± 4.5e-01 | 1.6700e-03 ± 1.3e-03 | 0.0421 | 1.2042 | 0.9048 | 97.41% | 12.23 | Gen 400 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

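    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |
    | NSGA2 | 1.00 | 0.97 | 0.97 | 0.94 | 1.00 | Optimal | 30 | RESEARCH |
    | NSGA3 | 1.00 | 0.51 | 0.58 | 0.91 | 0.62 | Low Cov, Low Density, Imbalanced | 30 | INDUSTRY |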

    DPF2 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.9019 | 4.0960e-05 ± 4.5e-07 | 3.5453e-03 ± 8.4e-06 | 1.7520 | 0.1921 | 0.1443 | 49.90% | 115.59 | Gen 100 |
    | NSGA2 | 0.0017 | 1.2828e-04 ± 2.0e-05 | 1.0080e-02 ± 8.1e-03 | 0.2612 | 0.3841 | 0.2886 | 99.72% | 72.52 | Gen 100 |
    | NSGA3 | 0.0031 | 1.2220e-04 ± 1.5e-05 | 1.4596e-02 ± 1.2e-02 | 0.5447 | 0.3844 | 0.2888 | 99.54% | 13.38 | Gen 200 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

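    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |
    | NSGA2 | 1.00 | 0.96 | 0.98 | 0.92 | 1.00 | Optimal | 30 | RESEARCH |
    | NSGA3 | 1.00 | 0.93 | 0.96 | 0.90 | 0.97 | Optimal | 30 | RESEARCH |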

    DPF3 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.0238 | 2.7892e-03 ± 8.8e-04 | 4.2399e-03 ± 9.4e-03 | 0.0000 | 0.7761 | 0.5831 | 97.65% | 115.10 | Gen - |
    | NSGA2 | 0.0016 | 3.0564e-03 ± 1.2e-04 | 2.0067e-03 ± 1.4e-03 | 0.2831 | 0.7945 | 0.5969 | 100.22% | 75.93 | Gen 800 |
    | NSGA3 | 0.0042 | 3.1938e-03 ± 1.6e-04 | 3.0014e-03 ± 2.3e-03 | 0.3242 | 0.7982 | 0.5997 | 99.99% | 12.23 | Gen 100 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

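    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.99 | 0.00 | 0.00 | 0.06 | 0.61 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |
    | NSGA2 | 0.99 | 0.81 | 0.89 | 0.84 | 0.88 | Optimal | 30 | RESEARCH |
    | NSGA3 | 0.99 | 0.65 | 0.77 | 0.82 | 0.82 | Low Cov | 30 | INDUSTRY |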

    DPF4 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.5025 | 1.4101e-01 ± 7.8e-01 | 8.9064e-04 ± 1.6e-03 | 0.1243 | 0.6949 | 0.5221 | 53.27% | 121.37 | Gen 300 |
    | NSGA2 | 0.0044 | 2.4544e-02 ± 1.5e-01 | 3.3861e-03 ± 3.4e-03 | 0.0128 | 1.1632 | 0.8739 | 99.64% | 82.92 | Gen 200 |
    | NSGA3 | 0.0233 | 1.8991e-02 ± 1.2e-01 | 1.3051e-02 ± 4.9e-02 | 0.0426 | 1.3061 | 0.9813 | 99.68% | 12.23 | Gen 200 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

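    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | Low Cov, Low Density, Irregular, Imbalanced | 5 | FAIL |
    | NSGA2 | 1.00 | 0.98 | 0.98 | 0.92 | 1.00 | Optimal | 30 | RESEARCH |
    | NSGA3 | 1.00 | 0.58 | 0.37 | 0.54 | 0.24 | Low Cov, Low Density, Irregular, Imbalanced | 30 | FAIL |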

    DPF5 Benchmark Analysis

    | Algorithm | IGD (Mean ± Std) | GD (Mean ± Std) | SP (Mean ± Std) | EMD (Wasserstein) | H_raw | H_ratio | H_rel | Time (s) | Stabil. |
    |---|---|---|---|---|---|---|---|---|---|
    | MOEAD | 0.0304 | 6.7792e-03 ± 1.4e-04 | 1.6212e-02 ± 2.0e-02 | 0.1130 | 0.7066 | 0.5309 | 96.80% | 113.39 | Gen 100 |
    | NSGA2 | 0.0142 | 9.1868e-03 ± 4.9e-03 | 2.0535e-02 ± 8.9e-03 | 0.1094 | 1.1082 | 0.8326 | 99.13% | 81.79 | Gen 100 |
    | NSGA3 | 0.0239 | 6.2888e-03 ± 1.8e-04 | 1.8744e-02 ± 1.0e-02 | 0.2093 | 0.7885 | 0.5924 | 97.87% | 12.51 | Gen 200 |

    Visual Semantics: Filled points mark solutions close to the Ground Truth, while hollow markers highlight points that remain far from the GT surface.

    Clinical Quality Matrix

    Clinical certification is aggregated over all available standard_runXX files for each algorithm. The scores represent normalized Quality: 1.0 (Optimal) to 0.0 (Random).

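    | Algorithm | FIT | COVERAGE | DENSITY | REGULARITY | BALANCE | SUMMARY | RUNS | CERTIFICATION |
    |---|---|---|---|---|---|---|---|---|
    | MOEAD | 1.00 | 1.00 | 0.00 | 0.00 | 0.59 | Low Density, Irregular, Imbalanced | 30 | FAIL |
    | NSGA2 | 1.00 | 0.01 | 1.00 | 0.85 | 0.92 | Low Cov | 30 | FAIL |
    | NSGA3 | 1.00 | 1.00 | 0.75 | 0.47 | 0.89 | Irregular | 30 | INDUSTRY |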